(Exponentiated) Stochastic Gradient Descent for L1 Constrained Problems
Authors: Sham Kakade, Dean Foster, Eyal Even-Dar
Abstract
This note is by Sham Kakade, Dean Foster, and Eyal Even-Dar. It is intended as an introductory piece on solving L1 constrained problems with online methods. Convex optimization problems with L1 constraints frequently underlie tasks such as feature selection and the computation of sparse representations. This note shows that the exponentiated gradient algorithm (of Kivinen and Warmuth (1997)), when used as a stochastic gradient descent algorithm, is quite effective as an optimization tool under general convex loss functions, requiring a number of gradient steps that is only logarithmic in the number of dimensions under mild assumptions. In particular, for supervised learning problems in which we desire to approximately minimize some general convex loss (including the square, logistic, hinge, or absolute loss) in the presence of many irrelevant features, this algorithm is efficient, with a sample complexity that is only logarithmic in the total number of features and a computational complexity that is only linear in the total number of features (ignoring log factors).
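As a rough illustration of the approach described above, the sketch below shows the EG± form of the exponentiated gradient algorithm used as stochastic gradient descent over the L1 ball of radius B. The function names, the step size, and the use of iterate averaging are illustrative choices rather than details taken from the note; `grad_fn` is assumed to return a stochastic gradient of the convex loss at the current weight vector.

```python
import numpy as np

def eg_pm_sgd(grad_fn, d, B=1.0, eta=0.1, n_steps=1000):
    """Sketch of EG+/- used as stochastic gradient descent over the L1 ball
    {w : ||w||_1 <= B}.  grad_fn(w) is assumed to return a stochastic gradient
    of the convex loss at w; all parameter names here are illustrative."""
    # Represent w = B * (p - m) with (p, m) forming a distribution over 2d signed directions.
    p = np.full(d, 1.0 / (2 * d))
    m = np.full(d, 1.0 / (2 * d))
    avg_w = np.zeros(d)
    for t in range(1, n_steps + 1):
        w = B * (p - m)
        g = grad_fn(w)
        # Multiplicative (exponentiated) update of the simplex weights, then renormalize.
        p = p * np.exp(-eta * B * g)
        m = m * np.exp(+eta * B * g)
        Z = p.sum() + m.sum()
        p, m = p / Z, m / Z
        avg_w += (w - avg_w) / t   # return the average iterate, as usual in SGD analyses
    return avg_w
```

For the square loss on data (X, y), for instance, `grad_fn` could draw a random example (x_i, y_i) and return (w @ x_i - y_i) * x_i; since w always lies in the L1 ball, no explicit projection step is needed.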
Similar resources
Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty
Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to ...
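A minimal sketch of the cumulative-penalty idea, as suggested by the description above (the exact bookkeeping and the lazy, sparse updates in the paper may differ), might look like this:

```python
import numpy as np

def sgd_l1_cumulative(grad_fn, d, lam=1e-4, eta=0.1, n_steps=1000):
    """Rough sketch of SGD with a cumulative L1 penalty.  grad_fn(w) is assumed
    to return a stochastic gradient of the *unregularized* loss at w; lam is the
    L1 strength.  For simplicity every weight is updated at every step, whereas
    the paper applies penalties lazily to the features that are actually touched."""
    w = np.zeros(d)
    q = np.zeros(d)   # total L1 penalty actually applied to each weight so far
    u = 0.0           # total L1 penalty each weight could have received so far
    for _ in range(n_steps):
        w -= eta * grad_fn(w)      # plain gradient step on the loss
        u += eta * lam             # accumulate the maximum possible penalty
        z = w.copy()
        pos, neg = w > 0, w < 0
        # Clip each weight toward zero by the outstanding penalty, never crossing zero.
        w[pos] = np.maximum(0.0, w[pos] - (u + q[pos]))
        w[neg] = np.minimum(0.0, w[neg] + (u - q[neg]))
        q += w - z                 # record how much penalty was actually applied
    return w
```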
Seismic impedance inversion using l1-norm regularization and gradient descent methods
We consider numerical solution methods for seismic impedance inversion problems in this paper. The inversion process is ill-posed. To tackle the ill-posedness of the problem and take the sparsity of the reflectivity function into consideration, an l1 norm regularization model is established. In computation, a nonmonotone gradient descent method based on Rayleigh quotient for solving the minimiz...
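The paper's nonmonotone, Rayleigh-quotient-based scheme is not reproduced here; as a generic point of reference for this kind of l1-regularized linear inverse problem, a plain proximal-gradient (ISTA) iteration looks roughly like the following, where A, b, and the step-size choice are illustrative:

```python
import numpy as np

def ista(A, b, lam=0.1, n_steps=500):
    """Generic proximal-gradient (ISTA) sketch for min_x 0.5*||A x - b||^2 + lam*||x||_1.
    This is a standard baseline for sparse inversion, not the nonmonotone
    Rayleigh-quotient method of the paper above."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_steps):
        grad = A.T @ (A @ x - b)           # gradient of the least-squares term
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding
    return x
```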
A Light Touch for Heavily Constrained SGD
Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a good trade-off ...
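One way to read the high-level idea is the following toy sketch, in which each step applies at most one randomly chosen constraint projection instead of a full projection onto the feasible region; this only illustrates the flavor of such methods and is not the paper's algorithm:

```python
import numpy as np

def lightly_constrained_sgd(grad_fn, projections, d, eta=0.05, n_steps=1000,
                            p_constraint=0.5, seed=0):
    """Toy sketch of SGD that enforces constraints only occasionally.  After each
    gradient step, with probability p_constraint a single randomly chosen
    constraint projection is applied, rather than projecting onto the full
    feasible set every iteration.  `projections` is a list of functions, each
    projecting a point onto one constraint set.  Illustrative only."""
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(n_steps):
        w -= eta * grad_fn(w)
        if rng.random() < p_constraint:
            proj = projections[rng.integers(len(projections))]
            w = proj(w)
    # A final pass over every constraint keeps the returned point close to feasible.
    for proj in projections:
        w = proj(w)
    return w
```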
Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the l1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weig...
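Taking the dual space to be the entropic one (the natural choice under an l1 constraint), a minimal sketch of mirror descent with recursive averaging might look as follows; it is closely related to the EG sketch given earlier, just written in its dual-space form, and all concrete choices here are illustrative rather than taken from the paper:

```python
import numpy as np

def mirror_descent_averaged(grad_fn, d, B=1.0, eta=0.05, n_steps=1000):
    """Sketch of entropic mirror descent with recursive averaging over the L1 ball
    of radius B.  The descent step is taken on a dual variable theta; the primal
    iterate is recovered through a softmax-like mirror map."""
    theta = np.zeros(2 * d)                 # dual variable over the 2d signed directions
    avg_w = np.zeros(d)
    for t in range(1, n_steps + 1):
        # Mirror map: a distribution over +/- coordinate directions, scaled to the ball.
        p = np.exp(theta - theta.max())
        p /= p.sum()
        w = B * (p[:d] - p[d:])
        g = grad_fn(w)
        theta -= eta * B * np.concatenate([g, -g])   # gradient step in the dual space
        avg_w += (w - avg_w) / t                     # recursive averaging of the estimates
    return avg_w
```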
Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, whe...